Lexical access for large-vocabulary speech recognition

Authors

Roger Ho-Yin Leung

Hong C. Leung

Abstract

In this paper, the lexical characteristics of two Chinese dialects (Mandarin and Cantonese) and American English are explored. Different lexical representations are investigated, including tonal syllables, base syllables, phonemes, and broad phonetic classes. Several measures are computed, including coverage, uniqueness, and cohort size. Our results are based on lexicons of 44K Chinese words obtained from the CallHome Corpus and 52K English words obtained from the COMLEX Corpus. We have found that the 4,000 most frequent words give a coverage of 92% for Chinese and 77% for English. The phonetic representation uniquely specifies 85%, 87%, and 93% of the lexicon for Mandarin, Cantonese, and English, respectively. While the three languages appear quite different when they are described by their full phoneme sets, their characteristics are more similar when they are represented in terms of broad phonetic classes.
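
The three measures named in the abstract (coverage, uniqueness, and cohort size) can be illustrated with a small sketch. It assumes that a "cohort" is the set of lexicon words sharing an identical representation and that cohort size is averaged uniformly over words; the paper's exact definitions (for example, frequency weighting) may differ. The function names, the toy lexicon, and the broad-class labels below are hypothetical illustrations, not data from CallHome or COMLEX.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

# A representation is a sequence of units (phonemes, syllables, or broad classes).
Pron = Tuple[str, ...]


def coverage(token_counts: Mapping[str, int], top_n: int) -> float:
    """Fraction of running-text tokens covered by the top_n most frequent words."""
    total = sum(token_counts.values())
    if total == 0:
        return 0.0
    top_counts = sorted(token_counts.values(), reverse=True)[:top_n]
    return sum(top_counts) / total


def cohorts(lexicon: Mapping[str, Pron]) -> Dict[Pron, List[str]]:
    """Group words that share an identical representation (assumed cohort definition)."""
    groups: Dict[Pron, List[str]] = defaultdict(list)
    for word, pron in lexicon.items():
        groups[pron].append(word)
    return dict(groups)


def uniqueness(lexicon: Mapping[str, Pron]) -> float:
    """Fraction of lexicon entries whose representation no other word shares."""
    if not lexicon:
        return 0.0
    singletons = sum(1 for words in cohorts(lexicon).values() if len(words) == 1)
    return singletons / len(lexicon)


def expected_cohort_size(lexicon: Mapping[str, Pron]) -> float:
    """Average cohort size seen by a word drawn uniformly from the lexicon."""
    if not lexicon:
        return 0.0
    # A cohort of size k holds k words, each of which sees a cohort of size k.
    return sum(len(words) ** 2 for words in cohorts(lexicon).values()) / len(lexicon)


def to_broad_classes(lexicon: Mapping[str, Pron], class_map: Mapping[str, str]) -> Dict[str, Pron]:
    """Rewrite each phoneme sequence as a sequence of broad phonetic classes."""
    return {word: tuple(class_map[p] for p in pron) for word, pron in lexicon.items()}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's corpora.
    lexicon = {
        "bat": ("b", "ae", "t"),
        "pat": ("p", "ae", "t"),
        "pad": ("p", "ae", "d"),
        "sun": ("s", "ah", "n"),
    }
    class_map = {"b": "STOP", "p": "STOP", "t": "STOP", "d": "STOP",
                 "ae": "VOWEL", "ah": "VOWEL", "s": "FRIC", "n": "NASAL"}
    broad = to_broad_classes(lexicon, class_map)
    print(uniqueness(lexicon), uniqueness(broad))  # 1.0 0.25
    print(expected_cohort_size(broad))             # 2.5
    counts = {"the": 50, "a": 30, "bat": 10, "sun": 10}
    print(coverage(counts, top_n=2))               # 0.8
```

In this toy case, collapsing phonemes into broad classes merges "bat", "pat", and "pad" into one cohort, which is how a coarser representation lowers uniqueness and raises cohort size.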

Related articles

Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

Vocabulary size has been suggested as a useful measure of "verbal abilities" that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is les...

Modeling Lexical Tones for Mandarin Large Vocabulary Continuous Speech Recognition

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...

A New Decoder Design For Large Vocabula

An important problem in large vocabulary speech recognition for agglutinative languages like Turkish is the high out-of-vocabulary (OOV) rate caused by the extensive number of distinct words. Recognition systems using words as the basic lexical elements have difficulty dealing with such a virtually unlimited vocabulary. We propose a new time-synchronous lexical tree decoder design using morphemes ...

Recognition of out-of-vocabulary words with sub-lexical language models

A major source of recognition errors, out-of-vocabulary (OOV) words are also semantically important; recognizing them is, therefore, crucial for understanding. Success, so far, has been modest, even on very constrained tasks. In this paper we present a new approach to unlimited vocabulary speech recognition based on using grapheme-to-phoneme correspondences for sub-lexical modeling of OOV words,...

The influence of lexical-access ability and vocabulary knowledge on measures of speech recognition in noise.

OBJECTIVE The main objective was to investigate the effect of linguistic abilities (lexical-access ability and vocabulary size) on different measures of speech-in-noise recognition in normal-hearing listeners with various levels of language proficiency. DESIGN Speech reception thresholds (SRTs) were measured for sentences in steady-state (SRTstat) and fluctuating noise (SRTfluc), and for digi...

Publication date: 1998
